Generable
I don’t work in infectious disease modeling, instead I analyze:
I’ve always been interested in different modeling approaches. There is a ton of value in cross-disciplinary pollination.
I work in multiple programming languages:
This is a 3-compartment ODE model, in which a person at any one time is either:
Image source: https://mc-stan.org/users/documentation/case-studies/boarding_school_case_study.html
Each of these states is termed a “compartment”, and we model the size of each compartment at time \(t\) as the number of individuals in this state at that time.
It is described by a system of ODEs:
\[ \begin{aligned} \frac{dS}{dt} &= -\beta S \frac{I}{N}\\ \frac{dI}{dt} &= \beta S \frac{I}{N} - \gamma I \\ \frac{dR}{dt} &= \gamma I \end{aligned} \]
The number of infections in any day or week will depend on the sizes of these compartments, and:
\[ R_0 = \frac{\beta}{\gamma} \]
This number is called the basic reproduction number, reflecting the expected number of new infections from a single infected person in a population of susceptible people.
src: https://en.wikipedia.org/wiki/Measles
Before we get started (if you want to follow along):
install_cmdstan()We are going to start by implementing the ode function body in Stan code:
functions {
real[] sir(real t, real[] y, real[] theta,
real[] x_r, int[] x_i) {
real beta = theta[1];
real gamma = theta[2];
real S = y[1];
real I = y[2];
real R = y[3];
real P = S + I + R;
vector[3] dy_dt;
dy_dt[1] = -beta * I * S / P;
dy_dt[2] = beta * I * S / P - gamma * I;
dy_dt[3] = gamma * I;
return(dy_dt);
}
}We can now write a minimal Stan program that will allow us to simulate different population dynamics.
We write all our methods as functions, so we can re-use them later.
We will save this in a file: sir-simulation.stan
functions {
real[] sir(real t, real[] y, real[] theta,
real[] x_r, int[] x_i) {
real beta = theta[1];
real gamma = theta[2];
real S = y[1];
real I = y[2];
real R = y[3];
real N = S + I + R;
real dy_dt[3];
dy_dt[1] = -beta * I * S / N;
dy_dt[2] = beta * I * S / N - gamma * I;
dy_dt[3] = gamma * I;
return(dy_dt);
}
real[,] simulate_sir(real[] t, data int N, data int i0, real beta, real gamma) {
int N_t = num_elements(t);
real y0[3] = {N - i0 + 0.0, i0 + 0.0, 0.0};
real t0 = 0;
real theta[2] = {beta, gamma};
real y[N_t, 3];
y = integrate_ode_rk45(sir, y0, t0, t, theta, rep_array(0.0, 0), rep_array(0, 0));
return(y);
}
}
model {
}We can now use the rstan package to expose these as R functions.
This is a powerful tool for model interrogation; the functions are written in Stan code, transpiled to C++, and exposed to R through Rcpp.
library(rstan)
expose_stan_functions('model/sir-simulation.stan')
t <- seq(from = 0, to = 25, by = 0.1)
sim_df <- simulate_sir(t = t,
N = 1000,
i0 = 1,
beta = 4,
gamma = 1/4
) %>%
map(set_names, c('S', 'I', 'R')) %>%
map_dfr(as_tibble_row, .id = 't_index') %>%
mutate(t_index = as.integer(t_index)) %>%
left_join(tibble(t = t) %>% mutate(t_index = row_number()))Of course, the data that are observed are rarely this clean.
Let’s assume we have data for daily counts of cases, and that we will model this as a negative binomial observation model.
We add a new function to our data simulation for the observed data:
Now, we can update our simulation to include both the asssumed “true” population incidence and the observed counts:
expose_stan_functions('model/sir-simulation.stan')
case_t <- seq(from = 0, to = max(t), by = 1)
sim_cases <- simulate_cases_rng(t = case_t,
N = 1000,
i0 = 1,
beta = 4,
gamma = 1/4,
phi = 10,
) %>%
tibble(cases = .) %>%
bind_cols(t = case_t)We first develop and evaluate our model using simulated data.
functions {
real[] sir(real t, real[] y, real[] theta,
real[] x_r, int[] x_i) {
real beta = theta[1];
real gamma = theta[2];
real S = y[1];
real I = y[2];
real R = y[3];
real N = S + I + R;
real dy_dt[3];
dy_dt[1] = -beta * I * S / N;
dy_dt[2] = beta * I * S / N - gamma * I;
dy_dt[3] = gamma * I;
return(dy_dt);
}
real[,] simulate_sir(real[] t, data int N, data int i0, real beta, real gamma) {
int N_t = num_elements(t);
real y0[3] = {N - i0 + 0.0, i0 + 0.0, 0.0};
real t0 = -0.1;
real theta[2] = {beta, gamma};
real y[N_t, 3];
y = integrate_ode_rk45(sir, y0, t0, t, theta, rep_array(0.0, 0), rep_array(0, 0));
return(y);
}
int[] simulate_cases_rng(real[] t, data int N, data int i0, real beta, real gamma, real phi) {
int N_t = num_elements(t);
real y[N_t, 3] = simulate_sir(t, N, i0, beta, gamma);
int cases[N_t] = neg_binomial_2_rng(y[,2], phi);
return(cases);
}
}
data {
int N_t;
real t[N_t];
int cases[N_t];
int N;
int i0;
}
parameters {
real<lower = 0> beta;
real<lower = 0> gamma;
real<lower = 0> phi_inv;
}
transformed parameters {
real phi = 1. / phi_inv;
real ys[N_t, 3] = simulate_sir(t, N, i0, beta, gamma);
}
model {
// priors
phi_inv ~ exponential(5);
beta ~ normal(0, 5);
gamma ~ normal(0.4, 0.5);
// likelihood
cases ~ neg_binomial_2(ys[,2], phi);
}
generated quantities {
int yrep[N_t] = neg_binomial_2_rng(ys[,2], phi);
real log_lik[N_t];
for (i in 1:N_t)
log_lik[i] = neg_binomial_2_lpmf(cases[i] | ys[i,2], phi);
}
Formatting posterior estimates takes some care
Once formatted, we can use tidybayes to summarize flexibly:
There is a canonical dataset describing a particularly virulent flu outbreak at a boarding school in 1978. Later, it was determined that the flu was H1N1, however this was not known during the event.
We are given the following in the data description:
The index case was infected by 1978-01-10, and had febrile illness from 1978-01-15 to 1978-01-18. 512 boys out of 763 became ill.
It’s worth noting that the minimal date in our data is: 1978-01-22.
These data are missing details of the index case. We will hard-code this information as our initial conditions.
We will also ignore the “convalescent” category, for now, and lump them in with the “recovered” group (neither infectious nor susceptible).
# prepare data
flu_N_t<- 763
date0 <- min(influenza_england_1978_school$date)
flu_t <- as.integer(influenza_england_1978_school$date - date0)
flu_cases <- influenza_england_1978_school$in_bed
flu_data <- list(N = flu_N_t,
i0 = 1,
N_t = length(flu_t),
t = flu_t,
cases = flu_cases)
# sample model
flu_fit <- basic_sir$sample(data = flu_data,
seed = params$seed, refresh = 0)Formatting posterior estimates takes some care
library(tidybayes)
fit_draws <- flu_fit %>%
gather_draws(yrep[i], ys[i, state_index]) %>%
ungroup() %>%
left_join(tibble(t = flu_t, cases = flu_cases) %>% mutate(i = row_number())) %>%
left_join(tibble(state = c('S', 'I', 'R')) %>%
mutate(state_index = row_number())) %>%
mutate(state = if_else(is.na(state), .variable, state)) %>%
select(state, .value, .chain, .iteration, .draw, cases, t) %>%
spread(state, .value)Now, we can compare the posterior estimates to our observed data:
The assumptions of the basic SIR model do not apply to most pandemics.
It is a starting point.
Formatting posterior estimates takes some care
library(tidybayes)
fit_draws <- covid_fit %>%
gather_draws(yrep[i], ys[i, state_index]) %>%
ungroup() %>%
left_join(tibble(t = covid_t, cases = covid_cases) %>% mutate(i = row_number())) %>%
left_join(tibble(state = c('S', 'I', 'R')) %>%
mutate(state_index = row_number())) %>%
mutate(state = if_else(is.na(state), .variable, state)) %>%
select(state, .value, .chain, .iteration, .draw, cases, t) %>%
spread(state, .value)As an example, consider this 8-compartment model used to describe COVID-19 disease dynamics.
The observed data include daily counts of Case, Hospitalization, and Death counts. Each is modeled using a negative binomial given the estimated true prevalence.
src: Keller, JP, Zhou, T, Kaplan, A, Anderson, GB, Zhou, W. Tracking the transmission dynamics of COVID-19 with a time-varying coefficient state-space model. Statistics in Medicine. 2022; 41( 15): 2745- 2767. doi:10.1002/sim.9382
In addition, their model includes particular covariate effects on time-varying infectiousness parameters.
For example, one informative covariate was the degree of mobility (from aggregate cell-phone tracking data)
Keller, JP, Zhou, T, Kaplan, A, Anderson, GB, Zhou, W. Tracking the transmission dynamics of COVID-19 with a time-varying coefficient state-space model. Statistics in Medicine. 2022; 41( 15): 2745- 2767. doi:10.1002/sim.9382